Abstract


We describe a viewpoint on the Dempster/Shafer “Theory of Evidence”, and provide an interpretation which regards the combination formulas as statistics of the opinions of “experts”. This is done by introducing spaces with binary operations that are simpler to interpret or simpler to implement than the standard combination formula, and showing that these spaces can be mapped homomorphically onto the Dempster/Shafer theory of evidence space. The experts in the space of “opinions of experts” combine information in a Bayesian fashion. We present alternative spaces for the combination of evidence suggested by this viewpoint.


1.  Introduction 

Many  problems  in  artificial  intelligence  call  for  assessments  of  degrees  of  belief 
in  propositions  based  on  evidence  gathered  from  disparate  sources.  It  is  often 
claimed  that  probabilistic  analysis  of  propositions  is  at  variance  with  intuitive 
notions  of  belief  [1,2,3].  Various  methods  have  been  introduced  to  reconcile  the 
discrepancies,  but  no  single  technique  has  settled  the  issue  on  both  theoretical  and 
pragmatic  grounds. 

One  method  for  attempting  to  modify  probabilistic  analysis  of  propositions  is 
the Dempster/Shafer “Theory of Evidence.” This theory is derived from notions of
upper  and  lower  probabilities,  as  developed  by  Dempster  in  [4].  The  idea  that 
intervals  instead  of  probability  values  can  be  used  to  model  degrees  of  belief  had 
been  suggested  and  investigated  by  earlier  researchers  [3, 6, 2, 7],  but  Dempster’s 
work  defines  the  upper  and  lower  points  of  the  intervals  in  terms  of  statistics  on 
set-valued functions defined over a measure space. The result is a collection of
intervals  defined  for  subsets  of  a  fixed  labeling  set,  and  a  combination  formula  for 
combining  collections  of  intervals. 

Dempster  explained  in  greater  detail  how  these  notions  could  be  used  to  assess 
beliefs  on  propositions  in  [8].  The  topic  was  taken  up  by  Shafer  [9, 10],  and  led  to 
publication of a monograph on the “Theory of Evidence” [11]. All of these works
after  [8]  emphasize  the  values  assigned  to  subsets  of  propositions  (the  “beliefs”), 
and  the  combination  formulas,  and  de-emphasize  the  connection  to  the  statistical 
foundations  based  on  the  set-valued  functions  on  a  measure  space.  This  paper  will 
relate  the  statistical  foundations  of  the  Dempster/Shafer  theory  of  evidence  to 
notions  of  beliefs  on  propositions. 

The  Dempster/Shafer  theory  of  evidence  has  sparked  considerable  debate 
among  statisticians  and  “knowledge  engineers”.  The  theory  has  been  criticized  and 
debated in terms of its behavior and applicability, e.g. [12, 13, 8 (commentaries following)]. Some of the questions have been answered by Shafer [14, 15], but discussion of the theoretical underpinnings continues, e.g. [1, 16, 3].

Recently,  there  has  been  increased  interest  in  the  use  of  the  Dempster/Shafer 
theory  of  evidence  in  expert  systems  [17, 18].  Most  of  the  recent  attempts  to  map 
the  theory  to  real  applications  and  practical  methods,  such  as  described  in 
[19, 20, 21, 22, 23], are based on the techniques described by Shafer [11], and disregard the statistical foundations from which the theory was derived. In this paper we present a viewpoint on the Dempster/Shafer theory of evidence that regards the theory as statistics of opinions of “experts”. We relate the evidence-combination formulas to statistics of experts who perform Bayesian updating in
pairs.  Finally,  we  suggest  a  related  formulation  that  leads  to  simpler  formulas  and 
fewer  variables. 

Recently, the authors have pointed out that the Dempster/Shafer theory of evidence is one technique in a large class of iterative knowledge aggregation methods [24]. These methods, which include relaxation labeling [25], stochastic relaxation [26] and neural models [27], always attempt to find a true labeling by updating
a  state  as  evidence  is  accumulated.  In  the  theory  of  evidence,  as  in  many  other 
models,  the  true  labeling  is  one  of  a  finite  number  of  possibilities,  but  the  state  is  a 
collection  of  numbers  describing  an  element  in  a  continuous  domain.  In  the  Shafer 
formulation,  the  state  of  the  system  is  described  by  a  distribution  over  the  set  of  all 
subsets  of  the  possible  labels.  That  is,  each  subset  A  of  labels  has  assigned  to  it  a 
number  representing  a  probability  that  the  subset  of  possible  labels  which  are  still 
possible  based  on  the  evidence  is  precisely  A  (see,  e.g.,  [28]).  Implicit  in  this  model 
is  the  notion  that  an  incremental  piece  of  evidence  carries  a  certain  amount  of 
weight  or  confidence,  and  distinguishes  a  subset  of  possibilities.  Evidence  may 
point  to  a  single  inference  among  the  set  of  labels,  or  may  point  to  a  subset  of  the 
alternatives.  Further,  the  Dempster/Shafer  theory  insists  that  no  mass  is  placed  on 
the  empty  set,  reflecting  the  assumption  that  the  label  set  is  exhaustive,  so  that  at 
least  one  label  must  be  correct. 

As  evidence  is  gained,  masses  are  updated  according  to  a  combination  formula. 
The  effect  of  an  incremental  bit  of  information  pointing  to  a  particular  subset  A  is  to 
transfer partial mass from sets to subsets defined by intersection with A. However, mass that would move from subsets disjoint from A to the empty set is instead redistributed among all nonempty subsets, each of which is rescaled by the same factor. Thus, new evidence typically concentrates mass in low-order subsets, except that mass directed to the empty set is recirculated to all nonempty subsets. The combination formula is commutative and associative, so a succession of incremental changes can be combined into a
single  state  that  can  be  regarded  as  a  non-primitive  updating  element. 

Shafer  defines  the  belief  on  a  subset  of  possibilities  A  to  be  the  sum  of  the 
masses  which  are  applied  to  subsets  of  A.  This  quantity  represents  a  belief  in  the 
statement  “The  truth  lies  in  A”,  and  corresponds  to  the  “lower  probability”  in 
Dempster’s  formulation.  The  highest  degree  of  support  that  the  evidence  provides 
for  a  subset  A  is  the  amount  of  mass  that  can  move  to  a  subset  of  A,  and  thus  is  a 
sum  of  masses  on  subsets  that  meet  A.  These  values,  called  “plausibilities”,  are  the 
“upper probabilities” defined by Dempster. Finally, Shafer defines the “commonality numbers” of a subset A as the sum of masses on subsets which contain A. Commonality numbers represent the total amount of mass that is available to move to the entire subset A. Shafer shows that the mass function, belief function, plausibility numbers, and commonality numbers are all equivalent formulations of a body of evidence, and that each can be derived from any other [11].
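To make these definitions concrete, here is a minimal Python sketch (our illustration, not from the source). It assumes a mass function is stored as a dict from frozenset subsets of a small frame to masses, with no mass on the empty set; all function names are hypothetical. It computes beliefs, plausibilities and commonalities, and recovers the masses from the belief function by Möbius inversion, one standard way of seeing that these formulations carry the same information.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of s (from the empty set up to s itself), as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def belief(m, A):
    """Bel(A): total mass on subsets B of A."""
    return sum(v for B, v in m.items() if B <= A)

def plausibility(m, A):
    """Pl(A): total mass on subsets B that meet A."""
    return sum(v for B, v in m.items() if B & A)

def commonality(m, A):
    """Q(A): total mass on subsets B that contain A."""
    return sum(v for B, v in m.items() if A <= B)

def mass_from_belief(bel, frame):
    """Moebius inversion: m(A) = sum over B subset of A of (-1)^|A - B| * Bel(B)."""
    m = {}
    for A in subsets(frame):
        val = sum((-1) ** len(A - B) * bel(B) for B in subsets(A))
        if abs(val) > 1e-12:          # drop subsets that carry no mass
            m[A] = val
    return m

# Illustrative frame and masses (made-up, dyadic so the float arithmetic is exact).
frame = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.25, frame: 0.25}

print(belief(m, frozenset({"a", "b"})))    # 0.75
print(plausibility(m, frozenset({"b"})))   # 0.5
print(commonality(m, frozenset({"a"})))    # 1.0
print(mass_from_belief(lambda A: belief(m, A), frame) == m)  # True
```

The inversion enumerates every subset of the frame, so it is only practical for small frames.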

The  Dempster/Shafer  theory  of  evidence  differs  from  Bayesian  analysis  in 
several  important  respects.  First,  beliefs  are  applied  not  only  to  the  singleton 
labels,  but  also  to  sets  of  labels  by  a  non-additive  set  function.  This  increase  in  the 
dimensionality  of  the  space  of  states  permits  distinctions  in  the  different  types  of 
evidence that can be represented. For example, lack of evidence can be represented in the theory of evidence by assigning mass to the entire set of possibilities rather than to any of its proper subsets, whereas conflicting evidence is denoted by placing large mass on disjoint subsets. Second, incremental evidence can take a fairly complex form, indicating a subset of possibilities without expressing preferences within the subset. Finally, the combination operation is considerably more complex than termwise multiplication of probabilities, unless all mass is concentrated in singleton subsets. In this way, the combination formula seems to extend Bayesian updating.

Recently,  Kyburg  [3]  has  shown  how  to  view  the  Dempster/Shafer  theory  of 
evidence  in  terms  of  a  collection  of  probabilistic  opinions  over  the  label  set.  His 
viewpoint is similar to the one here, in that a state is represented by a set of opinions. However, whereas we view the beliefs as statistics, Kyburg interprets these numbers as extrema over the collection of experts. Accordingly, although he contrasts the combination formula with Bayesian updating on the set of probabilistic opinions, the two viewpoints are different.

In general, the viewpoint that masses and other numbers are assigned to subsets of labels obscures the statistical basis on which the upper and lower probability
analysis  is  based.  In  Dempster’s  earlier  work  [4],  however,  the  set-valued  functions 
are  defined  over  measure  spaces,  which  can  each  be  viewed  as  a  probability  space 
yielding  subsets  of  labels  for  each  sample.  In  this  paper,  we  return  to  the  earlier 
Dempster  model  of  measures  on  a  measure  space,  and  relate  those  notions  to  spaces 
of  “experts”  with  opinions  expressed  as  subsets  of  possibilities.  This  portion  of  our 
formulation  appears  completely  and  in  greater  generality  in  [4],  although  we  hope  to 
make  the  connection  to  pragmatic  application  issues  more  explicit. 

This  paper  has  three  main  points.  First,  we  show  that  the  combination  rule  for 
the Dempster/Shafer theory of evidence may be simplified by omitting the normalization term. We next point out that the individual pairs of experts involved in the
combination  formula  can  be  regarded  as  performing  Bayesian  updating.  Finally,  we 
present  extensions  to  the  theory,  based  on  allowing  experts  to  express  probabilistic 
opinions  and  assuming  that  the  logarithms  of  experts’  opinions  over  the  set  of  labels 
are  multinormally  distributed. 

2.  The  Rule  of  Combination  and  Normalization 

The set of possible outcomes, or labelings, will be denoted in this paper by Λ. This set is the “frame of discernment”, and in other works has been denoted, variously, by Ω, Θ, or S. For convenience, we will assume that Λ is a finite set with n elements, although the framework could easily be extended to continuous label sets. More importantly, we will assume that Λ represents a set of states that are mutually exclusive and exhaustive. If Λ is not initially exhaustive, it can easily be made so by including an additional label denoting “none of the above.” If Λ is not mutually exclusive, it can be made so by replacement with its power set (i.e., the set of all subsets), so that each subset represents the occurrence of exactly that subset of labels, excluding all other labels. Of course, replacing Λ by its power set is perilous, in that it will greatly expand the cardinality of the label set. For practical applications, the implementer is more likely to want to replace Λ by the set of all plausible subsets describing a valid configuration.

An element (or state of belief) in the theory of evidence is represented by a probability distribution over the power set of Λ, 𝒫(Λ). That is, a state m is a function

$$ m : \mathcal{P}(\Lambda) \to [0,1], \qquad \sum_{A \subseteq \Lambda} m(A) = 1. $$

There is an additional proviso that is typically applied, namely that every state m satisfies

$$ m(\emptyset) = 0. $$

Section  3.2  introduces  a  plausible  interpretation  for  the  quantities  comprising  a 
state. 

A state is updated by combination with new evidence, or information, which is presented in the form of another state. Thus, given a current state m₁ and another state m₂, a combination of the two states is defined to yield a state m₁ ⊕ m₂ given by

$$ (m_1 \oplus m_2)(A) = \frac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{\sum_{B \cap C \neq \emptyset} m_1(B)\, m_2(C)} \quad \text{if } A \neq \emptyset, \tag{1a} $$

and

$$ (m_1 \oplus m_2)(\emptyset) = 0. $$

This is the so-called “Dempster Rule of Combination.” Note that the resulting function m₁ ⊕ m₂ is a probability mass due to the normalization factor, and that (m₁ ⊕ m₂)(∅) = 0 by definition.

The problem with this definition is that the denominator in (1a) might be zero, so that (m₁ ⊕ m₂)(A) is undefined. That is, there exist pairs m₁ and m₂ such that the combination of m₁ and m₂ is not defined. This, of course, is not a very satisfactory situation for a binary operation on a space. The solution which is frequently taken is to avoid combining such elements. An alternative is to add an additional element m₀ to the space:

$$ m_0(A) = 0 \ \text{ for } A \neq \emptyset, \qquad m_0(\emptyset) = 1. $$

Note that this additional element does not satisfy the condition m(∅) = 0. Then define, as a special case,

$$ m_1 \oplus m_2 = m_0 \quad \text{if} \quad \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C) = 1. \tag{1b} $$

The binary operation is then defined for all pairs m₁, m₂. The special element m₀ is an absorbent state, in the sense that m₀ ⊕ m = m ⊕ m₀ = m₀ for all states m.

This space has an identity element. The identity state, m_I, represents complete ignorance, in that combination with it yields no change (i.e., m_I ⊕ m = m ⊕ m_I = m for all states m). This state places full mass on the subset which is all of Λ:

$$ m_I(\Lambda) = 1, \qquad m_I(A) = 0 \ \text{ for } A \neq \Lambda. $$


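Definition 1: We define (ℳ, ⊕), the space of belief states, by

$$ \mathcal{M} = \Big\{ m : \mathcal{P}(\Lambda) \to \mathbb{R}^{+} \cup \{0\} \;\Big|\; \sum_{A \subseteq \Lambda} m(A) = 1,\ m(\emptyset) = 0 \Big\} \cup \{ m_0 \}, $$

and define ⊕ by (1a) when the denominator in (1a) is nonzero, and by (1b) otherwise. ■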
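A minimal Python sketch of ⊕ on ℳ, following (1a) and (1b) directly (our illustration; the dict-of-frozensets representation and the function name are assumptions, not from the source). Instead of testing the conflict sum in (1b) against 1, it tests the denominator of (1a) for zero, which is the same condition in exact arithmetic.

```python
from collections import defaultdict

EMPTY = frozenset()
M0 = {EMPTY: 1.0}  # the absorbent state m_0: all mass on the empty set

def dempster_combine(m1, m2):
    """Dempster's rule on M: Equation (1a), or m_0 via (1b) when (1a) is undefined.
    States are dicts mapping frozenset subsets of the frame to masses."""
    raw = defaultdict(float)
    for B, b in m1.items():
        for C, c in m2.items():
            raw[B & C] += b * c          # mass that the pair (B, C) sends to B ∩ C
    raw.pop(EMPTY, None)                 # conflicting mass is discarded ...
    denom = sum(raw.values())            # ... and the remainder renormalized
    if denom == 0.0:                     # total conflict: special case (1b)
        return dict(M0)
    return {A: v / denom for A, v in raw.items()}

frame = frozenset({"a", "b", "c"})
m_I = {frame: 1.0}                       # identity: complete ignorance
m1 = {frozenset({"a"}): 0.5, frame: 0.5}
m2 = {frozenset({"b"}): 0.5, frame: 0.5}

print(dempster_combine(m1, m2))          # {a}, {b} and the frame each get about 1/3
print(dempster_combine(m_I, m1) == m1)   # True
print(dempster_combine({frozenset({"a"}): 1.0}, {frozenset({"b"}): 1.0}) == M0)  # True
```

Note that combining with m₀ needs no separate branch: every intersection with ∅ is empty, so the denominator is zero and (1b) applies.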

The set ℳ, together with the combination operation ⊕, constitutes a monoid, since the binary operation is closed and associative, and there is an identity element.¹ In fact, the binary operation is commutative, so we can say that the space is an abelian monoid.


Still, because of the normalization and the special case in the definition of ⊕, the monoid ℳ is both ugly and cumbersome. It makes better sense to dispense with the normalization. We have


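Definition 2: We define (ℳ', ⊕'), the space of unnormalized belief states, by

$$ \mathcal{M}' = \Big\{ m : \mathcal{P}(\Lambda) \to \mathbb{R}^{+} \cup \{0\} \;\Big|\; \sum_{A \subseteq \Lambda} m(A) = 1 \Big\} $$

without the additional proviso, and set

$$ (m_1 \oplus' m_2)(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C) \qquad \forall A \subseteq \Lambda \tag{2} $$

for all pairs m₁, m₂ ∈ ℳ'. ■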

One can verify that m₁ ⊕' m₂ ∈ ℳ', and that ⊕' is associative and commutative. Further, the same element m_I defined above is also in ℳ', and is an identity. Thus ℳ' is also an abelian monoid. Clearly, ℳ' is a more attractive monoid than ℳ.
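Equation (2) is short enough to transcribe directly. The sketch below (again assuming the hypothetical dict-of-frozensets representation; the function name is ours) keeps whatever mass lands on ∅, so no normalization or special case is needed. On a small example it checks that ⊕' is commutative and associative, has m_I as identity, and preserves total mass. The masses are dyadic fractions so that the floating-point comparisons are exact.

```python
from collections import defaultdict

def combine_unnormalized(m1, m2):
    """Equation (2): (m1 ⊕' m2)(A) = sum over B ∩ C = A of m1(B) * m2(C).
    Mass sent to the empty set stays there; nothing is renormalized."""
    out = defaultdict(float)
    for B, b in m1.items():
        for C, c in m2.items():
            out[B & C] += b * c
    return dict(out)

frame = frozenset({"a", "b", "c"})
m_I = {frame: 1.0}
m1 = {frozenset({"a"}): 0.5, frame: 0.5}
m2 = {frozenset({"b"}): 0.5, frame: 0.5}
m3 = {frozenset({"a", "b"}): 0.75, frozenset({"c"}): 0.25}

m12 = combine_unnormalized(m1, m2)
print(m12[frozenset()])    # 0.25: the conflicting mass, kept on the empty set
print(sum(m12.values()))   # 1.0
print(m12 == combine_unnormalized(m2, m1))                          # True
print(combine_unnormalized(m12, m3)
      == combine_unnormalized(m1, combine_unnormalized(m2, m3)))    # True
print(combine_unnormalized(m_I, m1) == m1)                          # True
```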

We define a transformation V mapping ℳ' to ℳ by the formulas

$$ (Vm)(A) = \frac{m(A)}{1 - m(\emptyset)} \quad (A \neq \emptyset), \qquad (Vm)(\emptyset) = 0, \tag{3} $$

if m(∅) ≠ 1, and

$$ Vm = m_0 $$

otherwise.

¹ A structure with a closed associative binary operation is sometimes called a semigroup, so that the space in question is an abelian semigroup with an identity.

A computation shows that V preserves the binary operation; i.e.,

$$ V(m_1 \oplus' m_2) = V(m_1) \oplus V(m_2). $$

Thus V is a homomorphism.² Further, V is onto, since for m ∈ ℳ, the same m is in ℳ', and Vm = m. The algebraic terminology is that V is an epimorphism of monoids, a fact which we record in

Lemma 1: V maps homomorphically from (ℳ', ⊕') onto (ℳ, ⊕). ■
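The computation behind Lemma 1 can be spelled out as follows (our expansion of the step summarized above as “a computation shows”, not text from the source). Take the generic case m₁(∅) ≠ 1, m₂(∅) ≠ 1, (m₁ ⊕' m₂)(∅) ≠ 1, and let A ≠ ∅. Because B ∩ C ≠ ∅ forces both B and C to be nonempty, the factor 1/((1 − m₁(∅))(1 − m₂(∅))) introduced by V appears in every term of both the numerator and the denominator of (1a), and cancels:

$$
\begin{aligned}
\big(Vm_1 \oplus Vm_2\big)(A)
&= \frac{\sum_{B \cap C = A} (Vm_1)(B)\,(Vm_2)(C)}{\sum_{B \cap C \neq \emptyset} (Vm_1)(B)\,(Vm_2)(C)}
 = \frac{\sum_{B \cap C = A} m_1(B)\,m_2(C)}{\sum_{B \cap C \neq \emptyset} m_1(B)\,m_2(C)} \\
&= \frac{(m_1 \oplus' m_2)(A)}{1 - (m_1 \oplus' m_2)(\emptyset)}
 = \big(V(m_1 \oplus' m_2)\big)(A).
\end{aligned}
$$

The last denominator uses the fact that m₁ ⊕' m₂ has total mass 1, so the mass on intersecting pairs is 1 − (m₁ ⊕' m₂)(∅). Both sides vanish at ∅ by definition, and in the remaining cases both sides equal m₀: for instance, if m₁(∅) = 1 then Vm₁ = m₀, while (m₁ ⊕' m₂)(∅) = 1.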

A “representation” is a term that refers to a map that is an epimorphism of structures. Intuitively, such a map is important because it allows us to consider combination in the space formed by the range of the map as combinations of preimage elements. Lemma 1 will eventually form a small part of a representation to be defined in the next section. In the case in point, however, if it is required to combine elements in ℳ, one can perform the combinations in ℳ', and project to ℳ by V after all of the combinations are completed. Since combinations in ℳ' are much cleaner, this is a potentially useful observation. In terms of the Dempster/Shafer theory of evidence, this result says that the normalization in the combination formula is essentially irrelevant, and that combining can be handled by Equation (2). Specifically, given a sequence of states in ℳ to be combined, say m₁, m₂, …, mₖ, we can regard these states as elements in ℳ'. Since each mᵢ satisfies mᵢ(∅) = 0, they each satisfy Vmᵢ = mᵢ. Thus

$$ V(m_1 \oplus' m_2 \oplus' \cdots \oplus' m_k) = Vm_1 \oplus \cdots \oplus Vm_k = m_1 \oplus \cdots \oplus m_k, $$

which says that it suffices to compute the combinations using ⊕' (Equation (2)), and then project by V (Equation (3)). Of course, the final projection is necessary only if we absolutely insist on a result in ℳ. If any more combining is to be done, or if we are reasonably broad-minded, intermediate results can be interpreted directly as elements in ℳ'.
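The practical upshot can be checked numerically. The sketch below (our illustration; the representation and function names are assumptions) combines a sequence of states in two ways: by folding Dempster's rule (1a)/(1b), renormalizing at every step, and by folding the unnormalized rule (2) and applying V from (3) once at the end. Here `project` computes 1 − m(∅) as the total mass on nonempty subsets, which is the same quantity for elements of ℳ'.

```python
from collections import defaultdict
from functools import reduce

EMPTY = frozenset()
M0 = {EMPTY: 1.0}   # absorbent state m_0

def combine_unnormalized(m1, m2):
    """Equation (2): products m1(B) m2(C) accumulate on B ∩ C, empty set included."""
    out = defaultdict(float)
    for B, b in m1.items():
        for C, c in m2.items():
            out[B & C] += b * c
    return dict(out)

def project(m):
    """Equation (3), the map V: drop the empty-set mass and rescale the rest,
    or return m_0 if nothing is left (m(empty) = 1)."""
    rest = {A: v for A, v in m.items() if A != EMPTY}
    total = sum(rest.values())       # equals 1 - m(empty) for m in M'
    if total == 0.0:
        return dict(M0)
    return {A: v / total for A, v in rest.items()}

def dempster_combine(m1, m2):
    """Equations (1a)/(1b): discard conflict and renormalize after every combination."""
    raw = defaultdict(float)
    for B, b in m1.items():
        for C, c in m2.items():
            if B & C:                # only intersecting pairs enter (1a)
                raw[B & C] += b * c
    denom = sum(raw.values())
    return dict(M0) if denom == 0.0 else {A: v / denom for A, v in raw.items()}

frame = frozenset({"a", "b", "c"})
states = [
    {frozenset({"a"}): 0.5, frame: 0.5},
    {frozenset({"b"}): 0.5, frame: 0.5},
    {frozenset({"a", "b"}): 0.75, frozenset({"c"}): 0.25},
]

eager = reduce(dempster_combine, states)                # normalize at each step
lazy = project(reduce(combine_unnormalized, states))    # normalize once, at the end
print(sorted((sorted(A), round(v, 12)) for A, v in lazy.items()))
print(set(eager) == set(lazy) and all(abs(eager[A] - lazy[A]) < 1e-12 for A in eager))  # True
```

In this example both routes put mass 0.3 on each of {a}, {b}, {a, b} and 0.1 on {c}; the lazy route simply carries the conflicting mass along on ∅ until the final projection.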

3.  Spaces  of  Opinions  of  Experts 

In this section, we introduce two new spaces, based on the opinions of sample spaces of experts, and discuss the evaluation of statistics of experts' opinions. Finally, we interpret the combination rules in these spaces as being a form of Bayesian updating. In the following section we will show that these spaces also map homomorphically onto the space of belief states.

3.1.  Opinions  of  Experts 

We consider a set ℰ of “experts”, together with a map μ giving a weight or strength for each expert. It is convenient to think of ℰ as a large but finite set, although the essential restriction is that ℰ should be a measure space. Each expert ω ∈ ℰ maintains a list of possible labels: Dempster uses the notation Γ(ω) for this subset; i.e., Γ(ω) ⊆ Λ. Here we will assume that each expert ω has not just a subset of possibilities Γ(ω), but also a probabilistic opinion p_ω defined on Λ satisfying

$$
\begin{aligned}
& p_\omega(\lambda) \ge 0 \quad \forall \lambda \in \Lambda, \\
& p_\omega(\lambda) > 0 \iff \lambda \in \Gamma(\omega), \\
& \Big( \sum_{\lambda \in \Lambda} p_\omega(\lambda) = 1 \ \text{ or } \ p_\omega(\lambda) = 0 \ \forall \lambda \Big), \quad \forall \omega \in \mathcal{E}.
\end{aligned}
$$

² Strictly speaking, this merely shows that V is a homomorphism of semigroups; it is not hard to show that V maps the identity to the identity, which it must since it is onto, and thus it is also a homomorphism of monoids.

As suggested by the notation, p_ω(λ) represents expert ω's assessment of the probability of occurrence of the label λ. If an expert ω believes that a label λ is possible, i.e., λ ∈ Γ(ω), then the associated probability estimate p_ω(λ) will be nonzero. Conversely, if ω thinks that λ is impossible (λ ∉ Γ(ω)), then p_ω(λ) = 0. We also include the possibility that expert ω has no opinion, which is indicated by the special element p_ω ≡ 0. This state is included in order to ensure that the binary operation, to be defined later, is closed. We denote the collection of maps {p_ω | ω ∈ ℰ} by P.

It will turn out that the central point in the theory of evidence is that the p_ω(λ) data are used only in terms of a test for zero. Specifically, we set

$$ x_\omega(\lambda) = \begin{cases} 1 & \text{if } p_\omega(\lambda) > 0, \\ 0 & \text{if } p_\omega(\lambda) = 0. \end{cases} $$

Note that x_ω is the characteristic function of the set Γ(ω) over Λ, i.e., x_ω(λ) = 1 iff λ ∈ Γ(ω). The collection of all x_ω's will be denoted by X, and will be called the boolean opinions of the experts ℰ.

If we regard the space of experts ℰ as a sample space, then each x_ω(λ) can be regarded as a sample of a random (boolean) variable x(λ). In a similar way, the p_ω(λ)'s are also samples of random variables p(λ). The state of the system will be defined by statistics on the set of random variables {x(λ)}, λ ∈ Λ. These statistics are measured over the space of experts. If all experts have the same opinion, then the state should describe that set of possibilities, and the fact that there is a unanimity of opinion. If there is a divergence of opinions, the state should record the fact.

To compute statistics, we will simply sum the weights of experts in subsets of ℰ. If the experts have equal weights, this is equivalent to counting the number of experts. In general, we will sum the weights of experts in a subset ℱ ⊆ ℰ, and denote the result by μ(ℱ). Thus μ is in fact a measure on ℰ, although it is completely determined by the weights μ({ω}) of the individual experts ω ∈ ℰ. (We are assuming that ℰ is finite.) That is,

$$ \mu(\mathcal{F}) = \sum_{\omega \in \mathcal{F}} \mu(\{\omega\}). $$

It is important to observe that these measures are evaluated on subsets of experts, and not on subsets of Λ. The m(A) values which show up in Shafer's work are applied to subsets of the frame of discernment Λ, but are related to the measures μ defined on subsets of experts, as we will presently show. The measures μ do show up in Dempster's original work on upper and lower probabilities, however, and are the basis for the presentation that follows. In fact, Dempster treats a more general case where μ can be a measure defined on a Borel class of subsets of an infinite space of experts ℰ. For our purposes, it suffices to consider measures on finite sets of experts. It is nearly sufficient to consider nothing more than counting measures on finite sets of experts (experts equally weighted), although we defer an explanation of this point until the end of Section 4.

We are now ready to introduce the spaces which we will term “opinions of experts.” The central point is that the set of labels Λ is fixed, but that the set of experts ℰ can be different for distinct elements in these spaces. For the first space, we also require a fixed set of positive constants k_λ, one for each label.

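Definition 3: Let K = {k_λ} be a set of positive constants indexed over the label set Λ. The space of probabilistic opinions of experts, (𝒩, ⊗), is defined by

$$ \mathcal{N} = \Big\{ (\mathcal{E}, \mu, P) \;\Big|\; \#\mathcal{E} < \infty,\ \mu \text{ is a measure on } \mathcal{E},\ P = \{p_\omega\}_{\omega \in \mathcal{E}},\ p_\omega : \Lambda \to [0,1]\ \forall \omega, \text{ and } \forall \omega,\ \sum_{\lambda \in \Lambda} p_\omega(\lambda) = 1 \text{ or } p_\omega \equiv 0 \Big\}. $$

As noted earlier, the requirement that #ℰ < ∞ is for clarity of presentation; Dempster defines the space 𝒩 in a more general setting.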

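We define a binary operation on 𝒩 as follows. Given elements (ℰ₁, μ₁, P₁) and (ℰ₂, μ₂, P₂) in 𝒩, define

$$ (\mathcal{E}, \mu, P) = (\mathcal{E}_1, \mu_1, P_1) \otimes (\mathcal{E}_2, \mu_2, P_2) $$

by

$$ \mathcal{E} = \mathcal{E}_1 \times \mathcal{E}_2 = \{ (\omega_1, \omega_2) \mid \omega_1 \in \mathcal{E}_1,\ \omega_2 \in \mathcal{E}_2 \}, $$

$$ \mu(\{(\omega_1, \omega_2)\}) = \mu_1(\{\omega_1\})\, \mu_2(\{\omega_2\}), $$

and

$$ P = \{ p_{(\omega_1,\omega_2)} \}_{(\omega_1,\omega_2) \in \mathcal{E}}, \qquad p_{(\omega_1,\omega_2)}(\lambda) = \frac{p^{(1)}_{\omega_1}(\lambda)\, p^{(2)}_{\omega_2}(\lambda)\, k_\lambda^{-1}}{\sum_{\lambda' \in \Lambda} p^{(1)}_{\omega_1}(\lambda')\, p^{(2)}_{\omega_2}(\lambda')\, k_{\lambda'}^{-1}}, $$

providing the denominator is nonzero, and

$$ p_{(\omega_1,\omega_2)} \equiv 0 $$

otherwise. Here, $P_i = \{p^{(i)}_{\omega_i}\}_{\omega_i \in \mathcal{E}_i}$ for i = 1, 2, and the k_λ's are a fixed set of positive constants defined for Λ. ■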
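A direct transcription of Definition 3 may help. In this sketch (our illustration; the representation is an assumption) an element of 𝒩 is a pair of dicts: expert id → weight μ({ω}), and expert id → opinion, where an opinion maps every label to p_ω(λ) and the all-zero opinion plays the role of p_ω ≡ 0. Committee ids are pairs (ω₁, ω₂). The labels, weights and opinions are made up.

```python
def combine_probabilistic(e1, e2, k):
    """Definition 3: committee (w1, w2) gets weight mu1({w1}) * mu2({w2}) and an opinion
    proportional to p1(lam) * p2(lam) / k[lam], or the null opinion if that is all zero."""
    (mu1, P1), (mu2, P2) = e1, e2
    mu, P = {}, {}
    for w1, a in mu1.items():
        for w2, b in mu2.items():
            w = (w1, w2)
            mu[w] = a * b
            raw = {lam: P1[w1][lam] * P2[w2][lam] / k[lam] for lam in k}
            z = sum(raw.values())
            P[w] = ({lam: v / z for lam, v in raw.items()} if z > 0
                    else {lam: 0.0 for lam in k})
    return mu, P

k = {"a": 0.5, "b": 0.25, "c": 0.25}       # positive constants k_lambda
E1 = ({"u": 1.0, "v": 1.0},
      {"u": {"a": 0.5, "b": 0.5, "c": 0.0}, "v": {"a": 0.0, "b": 0.0, "c": 1.0}})
E2 = ({"w": 2.0, "x": 1.0},
      {"w": {"a": 0.25, "b": 0.5, "c": 0.25}, "x": {"a": 1.0, "b": 0.0, "c": 0.0}})

mu, P = combine_probabilistic(E1, E2, k)
print(mu[("u", "w")], P[("u", "w")])   # weight 2.0; opinion about 0.2 on a, 0.8 on b
print(P[("v", "x")])                   # the members disagree completely: null opinion
```

Since every k_λ is positive, a committee's consensus is nonzero exactly on the intersection of its members' possibility sets, which is why only the zero pattern survives in the boolean space that follows.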

To interpret this combining operation, consider two sets of experts ℰ₁ and ℰ₂, with each set of experts expressing opinions in the form of P₁ and P₂. We form a new set of experts, which is simply the set of all committees of two, consisting of one expert from ℰ₁ and another from ℰ₂. In each of the committees, the members confer to determine a consensus opinion. In Section 3.3, we will see how to interpret the formulas as Bayesian combination (where k_λ is the prior probability of label λ). And in the following section we will show that this space maps homomorphically onto the belief spaces. Finally, if, as in Dempster [4], we only regard the opinions of these experts in terms of a test for zero (i.e., disregarding the strength of nonzero opinions), we arrive at yet another space. A depiction of the combination of two boolean opinions is shown in Figure 1.

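Definition 4: The space of boolean opinions of experts, (𝒩', ⊙), is defined similarly:

$$ \mathcal{N}' = \Big\{ (\mathcal{E}, \mu, X) \;\Big|\; \#\mathcal{E} < \infty,\ \mu \text{ is a measure on } \mathcal{E},\ X = \{x_\omega\}_{\omega \in \mathcal{E}},\ x_\omega : \Lambda \to \{0,1\}\ \forall \omega \Big\}. $$

If (ℰ₁, μ₁, X₁) and (ℰ₂, μ₂, X₂) are elements in 𝒩', define their product

$$ (\mathcal{E}, \mu, X) = (\mathcal{E}_1, \mu_1, X_1) \odot (\mathcal{E}_2, \mu_2, X_2) $$

by

$$ \mathcal{E} = \mathcal{E}_1 \times \mathcal{E}_2 = \{ (\omega_1, \omega_2) \mid \omega_1 \in \mathcal{E}_1,\ \omega_2 \in \mathcal{E}_2 \} $$

and

$$ \mu(\{(\omega_1,\omega_2)\}) = \mu_1(\{\omega_1\})\, \mu_2(\{\omega_2\}), \qquad x_{(\omega_1,\omega_2)}(\lambda) = x^{(1)}_{\omega_1}(\lambda)\, x^{(2)}_{\omega_2}(\lambda), $$

where $X_i = \{x^{(i)}_{\omega_i} \mid \omega_i \in \mathcal{E}_i\}$, for i = 1, 2. ■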
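In the same hypothetical representation, an element of 𝒩' stores a 0/1 opinion x_ω(λ) for every label, and Definition 4 multiplies opinions pointwise, i.e., intersects the two experts' possibility sets. The sketch (our illustration; names and data are made up) also includes the test for zero that turns a probabilistic opinion into a boolean one.

```python
def combine_boolean(e1, e2):
    """Definition 4: committee (w1, w2) gets weight mu1({w1}) * mu2({w2}) and the
    pointwise product x1(lam) * x2(lam), i.e. the intersection of the two
    experts' sets of possible labels."""
    (mu1, X1), (mu2, X2) = e1, e2
    mu, X = {}, {}
    for w1, a in mu1.items():
        for w2, b in mu2.items():
            mu[(w1, w2)] = a * b
            X[(w1, w2)] = {lam: X1[w1][lam] * X2[w2][lam] for lam in X1[w1]}
    return mu, X

def to_boolean(p):
    """Test for zero: x(lam) = 1 if p(lam) > 0, else 0."""
    return {lam: 1 if v > 0 else 0 for lam, v in p.items()}

E1 = ({"u": 1.0, "v": 1.0},
      {"u": to_boolean({"a": 0.5, "b": 0.5, "c": 0.0}),
       "v": to_boolean({"a": 0.0, "b": 0.0, "c": 1.0})})
E2 = ({"w": 2.0, "x": 1.0},
      {"w": {"a": 1, "b": 1, "c": 1}, "x": {"a": 1, "b": 0, "c": 0}})

mu, X = combine_boolean(E1, E2)
for w in sorted(X):
    print(w, mu[w], sorted(lam for lam, bit in X[w].items() if bit))
```

The committee ('v', 'x') ends up with an empty list of possible labels; such committees are exactly the ones whose weight contributes to m(∅) in the statistics below.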
3.2. Statistics of Experts

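For a given subset A ⊆ Λ, the characteristic function χ_A is defined by

$$ \chi_A(\lambda) = \begin{cases} 1 & \text{if } \lambda \in A, \\ 0 & \text{if } \lambda \notin A. \end{cases} $$

Equality of two functions defined on Λ means, of course, that the two functions agree for all λ ∈ Λ. That is, x_ω = χ_A means

$$ x_\omega(\lambda) = \chi_A(\lambda) \quad \forall \lambda \in \Lambda, $$

which is the same thing as saying Γ(ω) = A.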

Given a space of experts ℰ and the boolean opinions X, we define

$$ m(A) = \frac{\mu(\{\omega \in \mathcal{E} \mid x_\omega = \chi_A\})}{\mu(\mathcal{E})} \tag{5} $$

for every subset A ⊆ Λ. It is possible to view these values as probabilities on the random variables {x(λ)}. We endow the elements of ℰ with the prior probabilities μ({ω})/μ(ℰ), and say that the probability of an event involving a combination of the random variables x(λ) over the sample space ℰ is the probability that the event is true for a particular sample, where the sample is chosen at random from ℰ with the sampling distribution given by the prior probabilities. This is equivalent to saying

$$ \mathrm{Prob}_{\mathcal{E}}(\text{Event}) = \frac{\mu(\{\omega \in \mathcal{E} \mid \text{Event is true for } \omega\})}{\mu(\mathcal{E})}. $$

With this convention, we see that

$$ m(A) = \mathrm{Prob}_{\mathcal{E}}\big(x(\lambda) = \chi_A(\lambda) \text{ for all } \lambda\big). $$

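In fact, all of the priors and joint statistics of the x(λ)'s are determined by the full collection of m(A) values. For example,

$$ \mathrm{Prob}_{\mathcal{E}}\big(x(\lambda_0) = 1\big) = \sum_{\{A \mid \lambda_0 \in A\}} m(A), $$

$$ \mathrm{Prob}_{\mathcal{E}}\big(x(\lambda_0) = 1 \text{ and } x(\lambda_1) = 1\big) = \sum_{\{A \mid \lambda_0, \lambda_1 \in A\}} m(A). $$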

Further, the full set of values m(A) for A ⊆ Λ defines an element m ∈ ℳ'. To see this, it suffices to check that Σ_A m(A) = 1, which amounts to observing that for every ω, x_ω = χ_A for exactly one A ⊆ Λ, so that each expert's weight is counted exactly once.
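Equation (5) and the marginal formula above can be checked on a toy population. In this sketch (our illustration; the weights and opinions are made up, and the representation follows the earlier hypothetical examples) each expert's boolean opinion is turned into its set Γ(ω), m(A) accumulates the normalized weight of the experts with Γ(ω) = A, and the probability that λ₀ = a is kept possible is computed both directly over the experts and from the masses. One expert holds no possibilities, so m puts mass on ∅ and lies in ℳ' but not in ℳ.

```python
def masses_from_experts(mu, X):
    """Equation (5): m(A) = mu({w : x_w = chi_A}) / mu(E), with A = Gamma(w)."""
    total = sum(mu.values())
    m = {}
    for w, weight in mu.items():
        gamma = frozenset(lam for lam, bit in X[w].items() if bit)   # Gamma(w)
        m[gamma] = m.get(gamma, 0.0) + weight / total
    return m

def prob_possible(mu, X, lam):
    """Prob_E(x(lam) = 1): weighted fraction of experts who keep lam possible."""
    return sum(wt for w, wt in mu.items() if X[w][lam]) / sum(mu.values())

mu = {"u": 1.0, "v": 1.0, "w": 1.0, "z": 1.0}
X = {"u": {"a": 1, "b": 1, "c": 0},
     "v": {"a": 0, "b": 0, "c": 1},
     "w": {"a": 1, "b": 0, "c": 0},
     "z": {"a": 0, "b": 0, "c": 0}}      # an expert with no possible labels

m = masses_from_experts(mu, X)
print(sum(m.values()))                   # 1.0, so m is an element of M'
print(m[frozenset()])                    # 0.25 of the weight sits on the empty set
print(prob_possible(mu, X, "a"),
      sum(v for A, v in m.items() if "a" in A))   # 0.5 0.5
```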

Recalling the definition of V (Equation (3)), we may also consider the numbers (Vm)(A). These values can also be interpreted as probabilities, provided we define probability in a way which ignores experts who give no possibilities, and provided there are some experts who give some possibilities (i.e., m(∅) ≠ 1). Then for A ≠ ∅,

$$ \hat{m}(A) = (Vm)(A) = \frac{m(A)}{1 - m(\emptyset)} $$

is the probability that a randomly chosen expert ω will state that the subset of possibilities is precisely A, conditioned on the requirement that the expert gives at least one possibility.

Under the assumptions that A ≠ ∅, m(∅) ≠ 1, and that probability is measured over the set of experts expressing an opinion, ℰ' = {ω | x_ω ≢ 0}, many of the quantities in the theory of evidence can be interpreted in terms of familiar statistics on the x(λ)'s. For example, the belief on a set A,

$$ \mathrm{Bel}(A) = \sum_{B \subseteq A} \hat{m}(B), $$

is simply the joint probability

$$ \mathrm{Bel}(A) = \mathrm{Prob}_{\mathcal{E}'}\big(x(\lambda) = 0 \text{ for } \lambda \notin A\big). $$

Note that the prior probabilities on the experts in ℰ' are given by μ({ω})/μ(ℰ'). The denominator in these priors is nonzero due to the assumption that m(∅) ≠ 1.

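In a similar way, plausibility values

$$ \mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} \hat{m}(B) = 1 - \mathrm{Bel}(\bar{A}), $$

where Ā = Λ − A denotes the complement of A, can be interpreted as disjunctive probabilities

$$ \mathrm{Pl}(A) = \mathrm{Prob}_{\mathcal{E}'}\big(x(\lambda) = 1 \text{ for some } \lambda \in A\big). $$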

The beliefs and plausibilities are the lower and upper probabilities as defined by Dempster. The commonality values

$$ Q(A) = \sum_{A \subseteq B} \hat{m}(B) $$

are joint probabilities:

$$ Q(A) = \mathrm{Prob}_{\mathcal{E}'}\big(x(\lambda) = 1 \text{ for all } \lambda \in A\big). $$
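These identities can be verified exhaustively on a small example. The sketch below (our illustration in the same hypothetical representation; weights are powers of two so the arithmetic is exact) builds m̂ over the opinionated experts ℰ', computes Bel, Pl and Q from m̂, computes the corresponding event probabilities directly over ℰ', and records every nonempty subset A where the two disagree.

```python
from itertools import chain, combinations

mu = {"u": 1.0, "v": 1.0, "w": 2.0, "z": 4.0}
X = {"u": {"a": 1, "b": 1, "c": 0},
     "v": {"a": 0, "b": 0, "c": 1},
     "w": {"a": 1, "b": 0, "c": 0},
     "z": {"a": 0, "b": 0, "c": 0}}      # no opinion: excluded from E'
labels = ["a", "b", "c"]

# E': experts that keep at least one label possible.
E_prime = {w: wt for w, wt in mu.items() if any(X[w].values())}
total = sum(E_prime.values())

# m-hat = V m, computed directly as weighted frequencies of Gamma(w) over E'.
m_hat = {}
for w, wt in E_prime.items():
    gamma = frozenset(lam for lam in labels if X[w][lam])
    m_hat[gamma] = m_hat.get(gamma, 0.0) + wt / total

def prob(event):
    """Probability over E' of an event on a boolean opinion x."""
    return sum(wt for w, wt in E_prime.items() if event(X[w])) / total

mismatches = []
nonempty = chain.from_iterable(combinations(labels, r) for r in range(1, len(labels) + 1))
for A in map(frozenset, nonempty):
    bel = sum(v for B, v in m_hat.items() if B <= A)
    pl = sum(v for B, v in m_hat.items() if B & A)
    q = sum(v for B, v in m_hat.items() if A <= B)
    checks = [
        (bel, prob(lambda x: all(not x[l] for l in labels if l not in A))),
        (pl, prob(lambda x: any(x[l] for l in A))),
        (q, prob(lambda x: all(x[l] for l in A))),
    ]
    if any(abs(s - t) > 1e-12 for s, t in checks):
        mismatches.append(A)

print(m_hat)
print(mismatches)
```

The heavily weighted expert z has no opinion and therefore has no influence on m̂, which is the conditioning on ℰ' described above.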

To recapitulate, we have defined a mapping from P values to X values, and then transformations from X to m and m̂ values. The resulting element m̂, which contains statistics on the X variables, is an element in the space of belief states ℳ of the Dempster/Shafer theory of evidence (Section 2).

3.3.  Bayesian  Interpretation 

We now interpret the manner in which pairs of experts achieve a consensus opinion. We will show that the combination formulas given for 𝒩 and 𝒩' are consistent with a Bayesian interpretation. Our treatment is standard.

We first consider the combination of (ℰ₁, μ₁, P₁) and (ℰ₂, μ₂, P₂) in 𝒩. We assume that the experts in ℰⱼ have available to them information sⱼ. Note that all experts in a given set of experts share the same information. The information sⱼ consists of boolean predicates constituting evidence about the labeling situation. For example, in a medical diagnosis application, sⱼ might consist of statements about the presence or absence of a set of symptoms. Each set of experts ℰⱼ deals with a different set of symptoms.
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In  genera],  the  information  sj  is  the  result  of  a  set  of  tests  having  boolean  out¬ 
comes.  We  could  write  sj  -  //(a),  where  fj  represents  the  tests,  and  a  is  the 
current  situation  which  is  an  element  in  some  sample  space  of  labeling  problems 
a€2.  Assuming  2  is  also  a  measure  space,  there  are  prior  probabilities  on  the 
information  coefficients: 


$$\mathrm{Prob}(s_i) = \mathrm{Prob}\big(f_i(\sigma) = s_i\big).$$

There are also prior probabilities on the true label $\lambda(\sigma)$ for labeling situation $\sigma$, given by

$$\mathrm{Prob}(\lambda) = \mathrm{Prob}\big(\lambda(\sigma) = \lambda\big).$$


Note that these probabilities are not measured over the space of experts $\mathcal{E}$, but instead are measured over the collection of instances $\Sigma$ of the labeling problem. For example, in a medical diagnosis domain, $\Sigma$ might represent the set of all patients.

For $i = 1,2$, we will suppose that $p^{(i)}_{\omega_i}(\lambda)$ represents expert $\omega_i$'s estimate of

$$\mathrm{Prob}(\lambda \mid s_i),$$

the probability (over $\Sigma$) that $\lambda(\sigma) = \lambda$ conditioned on $f_i(\sigma) = s_i$. The “expert” $(\omega_1,\omega_2)$ should then estimate $\mathrm{Prob}(\lambda \mid s_1,s_2)$, which is the probability that $\lambda(\sigma) = \lambda$ given that $f_1(\sigma) = s_1$ and $f_2(\sigma) = s_2$, thus combining the two bodies of evidence seen by the two experts in that committee. This committee proceeds as follows:


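Bayes’ formula implies that

$$\mathrm{Prob}(\lambda \mid s_1,s_2) = \frac{\mathrm{Prob}(\lambda)\,\mathrm{Prob}(s_1,s_2 \mid \lambda)}{\mathrm{Prob}(s_1,s_2)} = \frac{\mathrm{Prob}(\lambda)\,\mathrm{Prob}(s_1 \mid \lambda)\,\mathrm{Prob}(s_2 \mid s_1,\lambda)}{\mathrm{Prob}(s_1,s_2)}.$$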


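Applying Bayes’ formula to $\mathrm{Prob}(s_1 \mid \lambda)$, this becomes

$$\mathrm{Prob}(\lambda \mid s_1,s_2) = \frac{\mathrm{Prob}(s_1)}{\mathrm{Prob}(s_1,s_2)}\,\mathrm{Prob}(\lambda \mid s_1)\,\mathrm{Prob}(s_2 \mid s_1,\lambda). \qquad (6)$$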


At this point we assume that

$$\mathrm{Prob}(s_2 \mid s_1,\lambda) = \mathrm{Prob}(s_2 \mid \lambda). \qquad (7)$$


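Using this assumption, we obtain by combining (6) and (7), and applying Bayes’ formula to $\mathrm{Prob}(s_2 \mid \lambda)$,

$$\mathrm{Prob}(\lambda \mid s_1,s_2) = c(s_1,s_2)\,\frac{\mathrm{Prob}(\lambda \mid s_1)\,\mathrm{Prob}(\lambda \mid s_2)}{\mathrm{Prob}(\lambda)}, \qquad (8)$$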


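where $c(s_1,s_2)$ is a constant independent of $\lambda$. Using Equation (8), expert $(\omega_1,\omega_2)$ estimates that

$$p_{(\omega_1,\omega_2)}(\lambda) = c(\omega_1,\omega_2)\,\frac{p^{(1)}_{\omega_1}(\lambda)\,p^{(2)}_{\omega_2}(\lambda)}{\kappa_\lambda} \qquad (9)$$

based on the independence assumption (7), where $\kappa_\lambda = \mathrm{Prob}(\lambda)$. Since the left-hand side of this equation should sum to 1 over $\lambda$, we have that

$$c(\omega_1,\omega_2) = \frac{1}{\sum_{\lambda} p^{(1)}_{\omega_1}(\lambda)\,p^{(2)}_{\omega_2}(\lambda)/\kappa_\lambda} \qquad (10)$$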

unless, of course, this denominator is zero, in which case we resort to setting $p_{(\omega_1,\omega_2)} \equiv 0$. Combining (9) and (10) gives the combination formula given in Definition 3. Thus, we have shown that combination in $\mathcal{N}$ is a form of Bayesian updating of pairs of experts, based on an independence assumption.
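The committee rule of Equations (9) and (10) is short enough to state directly. The sketch below uses exact rational arithmetic and checks numerically that when the two tests are conditionally independent given the label, as in assumption (7), and each expert reports its exact single-source posterior, the committee opinion coincides with the exact posterior $\mathrm{Prob}(\lambda \mid s_1,s_2)$. The priors and likelihoods are made-up numbers, and the function names are hypothetical.

```python
from fractions import Fraction as F

def committee_opinion(p1, p2, prior):
    """Eqs. (9)-(10): p(l) = c * p1(l) * p2(l) / prior(l), normalized over labels.
    If the normalizing sum is zero, the committee opinion is identically zero."""
    raw = [a * b / k for a, b, k in zip(p1, p2, prior)]
    total = sum(raw)
    if total == 0:
        return [F(0)] * len(raw)
    return [r / total for r in raw]

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

# Made-up priors Prob(l) and likelihoods Prob(s_i | l) for three labels.
prior = [F(1, 2), F(1, 4), F(1, 4)]
lik1 = [F(1, 5), F(3, 5), F(1, 2)]
lik2 = [F(2, 3), F(1, 3), F(1, 6)]

# Each expert reports its exact posterior given only its own evidence.
p1 = normalize([k * a for k, a in zip(prior, lik1)])   # Prob(l | s1)
p2 = normalize([k * b for k, b in zip(prior, lik2)])   # Prob(l | s2)

# Exact posterior when s1 and s2 are conditionally independent given l, as in (7).
exact = normalize([k * a * b for k, a, b in zip(prior, lik1, lik2)])

print(committee_opinion(p1, p2, prior) == exact)   # True
```

If the independence assumption fails, the same formula still produces a normalized opinion, but it is no longer the exact posterior.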

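To interpret the combination formula of $\mathcal{N}'$ in a Bayesian fashion, a weaker independence assumption suffices. The combination formula can be restated as:

$$x_{(\omega_1,\omega_2)}(\lambda) = 0 \iff x^{(1)}_{\omega_1}(\lambda) = 0 \text{ or } x^{(2)}_{\omega_2}(\lambda) = 0.$$

Using Bayes’ formula, and assuming that all prior probabilities are nonzero, it suffices to show that

$$\mathrm{Prob}(s_1,s_2 \mid \lambda) = 0 \iff \mathrm{Prob}(s_1 \mid \lambda) = 0 \text{ or } \mathrm{Prob}(s_2 \mid \lambda) = 0.$$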

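The “if” part follows since

$$\mathrm{Prob}(s_1,s_2 \mid \lambda) = \mathrm{Prob}(s_1 \mid \lambda)\,\mathrm{Prob}(s_2 \mid s_1,\lambda) = \mathrm{Prob}(s_2 \mid \lambda)\,\mathrm{Prob}(s_1 \mid s_2,\lambda).$$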

The “only if” part becomes our independence assumption, and is equivalent to

$$\mathrm{Prob}(s_1 \mid \lambda) > 0 \text{ and } \mathrm{Prob}(s_2 \mid \lambda) > 0 \implies \mathrm{Prob}(s_1,s_2 \mid \lambda) > 0. \qquad (11)$$

This assumption is implied by our earlier hypothesis (7). However, assumption (11) is more defensible, and is actually all that is needed to regard updating in the space of “boolean opinions of experts,” $\mathcal{N}'$, as Bayesian. Since the Dempster/Shafer theory deals only with the boolean opinions, Equation (11) is the required independence assumption.

4.  Equivalence  with  the  Dempster/Shafer  Rule  of  Combination 

At this point, we have four spaces with binary operations, namely $(\mathcal{N},\otimes)$, $(\mathcal{N}',\otimes)$, $(\mathcal{M}',\oplus')$, and $(\mathcal{M},\oplus)$. We will now show that these four spaces are closely related. It is not hard to show that the binary operation is, in all four cases, commutative and associative, and that each space has an identity element, so that these spaces are abelian monoids. We also have

Definition 5: The map $T: \mathcal{N} \to \mathcal{N}'$, with $(\mathcal{E},\mu,X) = T(\mathcal{E},\mu,P)$, is given by equation (4), i.e., $x_\omega(\lambda) = 1$ iff $p_\omega(\lambda) > 0$, and $x_\omega(\lambda) = 0$ otherwise. ■

There is another mapping U, given by

Definition 6: The map $U: \mathcal{N}' \to \mathcal{M}'$, with $m = U(\mathcal{E},\mu,X)$, is given by equation (5), i.e.,

$$m(A) = \mu(\{\omega \in \mathcal{E} \mid x_\omega = \chi_A\})\,/\,\mu(\mathcal{E}).$$
■

We  will  show  that  T  and  U  preserve  the  binary  operations.  More  formally,  we 
show  that  T  and  U  are  homomorphisms  of  monoids. 

Lemma 2: $T$ is a homomorphism from $\mathcal{N}$ onto $\mathcal{N}'$.

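Proof: It is a simple matter to verify that

$$T\big((\mathcal{E}_1,\mu_1,P_1) \otimes (\mathcal{E}_2,\mu_2,P_2)\big) = T(\mathcal{E}_1,\mu_1,P_1) \otimes T(\mathcal{E}_2,\mu_2,P_2).$$

The essential point, it turns out, is that since the probabilistic opinions are all nonnegative,

$$p^{(1)}_{\omega_1}(\lambda)\,p^{(2)}_{\omega_2}(\lambda) > 0 \iff p^{(1)}_{\omega_1}(\lambda) > 0 \text{ and } p^{(2)}_{\omega_2}(\lambda) > 0.$$

T is easily seen to be onto. ■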
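Lemma 2 can be checked mechanically on a small example. The sketch below combines two weighted sets of probabilistic experts in $\mathcal{N}$ (product space, multiplied weights, committee opinions from Equations (9) and (10)) and in $\mathcal{N}'$ (pointwise product of boolean opinions), and confirms that applying T before or after combination gives the same result. The list-of-(weight, opinion) encoding and the function names are assumptions made for this illustration.

```python
from fractions import Fraction

def committee_opinion(p1, p2, prior):
    """Eqs. (9)-(10): normalized p1(l) p2(l) / prior(l); all zeros if the sum vanishes."""
    raw = [a * b / k for a, b, k in zip(p1, p2, prior)]
    total = sum(raw)
    return [r / total for r in raw] if total != 0 else [0] * len(raw)

def combine_N(e1, e2, prior):
    """Combination in N: committees (w1, w2) on the product space, weights multiply."""
    return [(w1 * w2, committee_opinion(p1, p2, prior))
            for w1, p1 in e1 for w2, p2 in e2]

def combine_N_prime(e1, e2):
    """Combination in N': a label stays possible only if both experts allow it."""
    return [(w1 * w2, [a * b for a, b in zip(x1, x2)])
            for w1, x1 in e1 for w2, x2 in e2]

def T(e):
    """Definition 5: x(l) = 1 iff p(l) > 0."""
    return [(w, [1 if p > 0 else 0 for p in op]) for w, op in e]

prior = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
e1 = [(1, [Fraction(1, 2), Fraction(1, 2), 0]), (2, [0, Fraction(1, 3), Fraction(2, 3)])]
e2 = [(1, [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]), (1, [1, 0, 0])]

print(T(combine_N(e1, e2, prior)) == combine_N_prime(T(e1), T(e2)))  # True
```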

Lemma 3: $U$ is a homomorphism of $\mathcal{N}'$ onto $\mathcal{M}'$.

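Proof: Consider $(\mathcal{E},\mu,X) = (\mathcal{E}_1,\mu_1,X_1) \otimes (\mathcal{E}_2,\mu_2,X_2)$. For each $\omega_1 \in \mathcal{E}_1$ and $\omega_2 \in \mathcal{E}_2$, the corresponding $x^{(1)}_{\omega_1}$ and $x^{(2)}_{\omega_2}$ are characteristic functions of subsets of $\Lambda$, say $\chi_B$ and $\chi_C$ respectively. It is clear that

$$\chi_B \cdot \chi_C = \chi_A \iff B \cap C = A.$$

Thus

$$x_{(\omega_1,\omega_2)} = \chi_A \iff x^{(1)}_{\omega_1} = \chi_B \text{ and } x^{(2)}_{\omega_2} = \chi_C \text{ where } B \cap C = A.$$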

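So

$$\{(\omega_1,\omega_2) \in \mathcal{E} \mid x_{(\omega_1,\omega_2)} = \chi_A\} = \bigcup_{B \cap C = A} \{\omega_1 \mid x^{(1)}_{\omega_1} = \chi_B\} \times \{\omega_2 \mid x^{(2)}_{\omega_2} = \chi_C\}.$$

Since this is a disjoint union, using properties of measures, this gives

$$\mu\{(\omega_1,\omega_2) \in \mathcal{E} \mid x_{(\omega_1,\omega_2)} = \chi_A\} = \sum_{B \cap C = A} \mu_1\{\omega_1 \in \mathcal{E}_1 \mid x^{(1)}_{\omega_1} = \chi_B\} \cdot \mu_2\{\omega_2 \in \mathcal{E}_2 \mid x^{(2)}_{\omega_2} = \chi_C\}.$$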

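We can divide both sides of this equation by $\mu\{\mathcal{E}\} = \mu_1\{\mathcal{E}_1\} \cdot \mu_2\{\mathcal{E}_2\}$ to obtain

$$m(A) = \sum_{B \cap C = A} m_1(B)\,m_2(C),$$

where $m = U(\mathcal{E},\mu,X)$, and $m_i = U(\mathcal{E}_i,\mu_i,X_i)$, $i = 1,2$. Thus

$$U\big((\mathcal{E}_1,\mu_1,X_1) \otimes (\mathcal{E}_2,\mu_2,X_2)\big) = U(\mathcal{E}_1,\mu_1,X_1) \oplus' U(\mathcal{E}_2,\mu_2,X_2),$$

which is to say that U is a homomorphism.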

Finally, we show that U is onto. Recall that there are $n$ elements in $\Lambda$, and so there are $2^n$ different subsets of $\Lambda$. For a given mass distribution $m$ in $\mathcal{M}'$, consider a set of $2^n$ experts $\mathcal{E}$, with each expert $\omega \in \mathcal{E}$ giving a distinct subset $\Gamma(\omega) \subseteq \Lambda$ as the set of possibilities. If we give expert $\omega$ the weight $\mu\{\omega\} = m(\Gamma(\omega))$, and set $x_\omega = \chi_{\Gamma(\omega)}$, then it is easy to see that $m = U(\mathcal{E},\mu,X)$. ■
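The counting argument of Lemma 3, and the construction showing that U is onto, can be mirrored in a few lines. Here each boolean opinion $x_\omega$ is stored directly as the set $A$ with $x_\omega = \chi_A$, so combination in $\mathcal{N}'$ becomes set intersection over the product space; this encoding and all names are illustrative assumptions. The check confirms $U(X_1 \otimes X_2) = U(X_1) \oplus' U(X_2)$, with $\oplus'$ the unnormalized sum over $B \cap C = A$.

```python
from fractions import Fraction

def combine_boolean(e1, e2):
    """N' combination with x_w stored as its set A (x_w = chi_A): intersect the sets."""
    return [(w1 * w2, B & C) for w1, B in e1 for w2, C in e2]

def U(e):
    """Definition 6: m(A) = mu{w : x_w = chi_A} / mu(E)."""
    total = sum(w for w, _ in e)
    m = {}
    for w, A in e:
        m[A] = m.get(A, 0) + Fraction(w) / total
    return m

def combine_unnormalized(m1, m2):
    """Operation on M': m(A) = sum of m1(B) m2(C) over B & C == A, no renormalization."""
    m = {}
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            m[B & C] = m.get(B & C, 0) + v1 * v2
    return m

def preimage_under_U(m):
    """Onto construction: one expert per focal set A, weighted by m(A)."""
    return [(v, A) for A, v in m.items()]

e1 = [(1, frozenset({0})), (3, frozenset({0, 1}))]
e2 = [(2, frozenset({1})), (2, frozenset({0, 1, 2}))]
print(U(combine_boolean(e1, e2)) == combine_unnormalized(U(e1), U(e2)))  # True
```

Note that mass can land on the empty set (here $1/8$, from the committee pairing $\{0\}$ with $\{1\}$); that is exactly what distinguishes $\mathcal{M}'$ from $\mathcal{M}$.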

In the immediately preceding proof that U is onto, we assigned weights to experts. This is the only place where we absolutely require the existence of differential weights on experts. However, if we restrict ourselves to spaces $\mathcal{M}'$ and $\mathcal{M}$ containing only rational values for the mass distribution functions (as, for example, is the case in any computer implementation), then the weights can be eliminated and replaced by counting measure. For in this case, given a rational mass distribution $m$, we multiply by a common multiple of the denominators of the fractions appearing in the $2^n$ values of the mass distribution to obtain $2^n$ integer values in proportion to the given $m(A)$ values. We then construct an element in $\mathcal{N}'$ by replicating each expert the appropriate number of times, given by the integer value corresponding to the subset $A \subseteq \Lambda$ designated by the expert as the subset of possibilities.
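The replication argument can be sketched as follows: scale the rational masses by the least common multiple of their denominators and repeat each possibility set that many times, giving an unweighted set of experts whose counting-measure statistics reproduce $m$. The dictionary encoding of $m$ (focal sets with positive mass only) and the helper names are hypothetical; `math.lcm` with several arguments requires Python 3.9 or later.

```python
from collections import Counter
from fractions import Fraction
from math import lcm

def replicate_experts(m):
    """Unweighted experts (one possibility set each) with counts proportional to m."""
    scale = lcm(*(v.denominator for v in m.values()))
    experts = []
    for A, v in m.items():
        count = v * scale   # an integer, since scale is a common multiple of the denominators
        experts.extend([A] * int(count))
    return experts

def U_counting(experts):
    """U under counting measure: m(A) = (# experts naming exactly A) / (# experts)."""
    n = len(experts)
    return {A: Fraction(c, n) for A, c in Counter(experts).items()}

m = {frozenset({0}): Fraction(1, 2),
     frozenset({0, 1}): Fraction(1, 3),
     frozenset({1, 2}): Fraction(1, 6)}
experts = replicate_experts(m)
print(len(experts), U_counting(experts) == m)   # 6 True
```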

Recall from Section 2 that the map $V: \mathcal{M}' \to \mathcal{M}$ is also a homomorphism. So we can compose the homomorphisms $T: \mathcal{N} \to \mathcal{N}'$, $U: \mathcal{N}' \to \mathcal{M}'$, and $V$ to obtain the following obvious theorem.

Theorem: The map $V \circ U \circ T: \mathcal{N} \to \mathcal{M}$ is a homomorphism of monoids mapping onto the space of belief states $(\mathcal{M},\oplus)$. ■

This theorem provides the justification for the viewpoint that the theory of evidence space $\mathcal{M}$ represents the space $\mathcal{N}$ via the representation $V \circ U \circ T$. The proof follows from the lemmas: since each of the component maps in this representation is an onto homomorphism, the composition also maps homomorphically onto the entire theory of evidence space.

The significance of this result is that we can regard combinations of elements in the theory of evidence as combinations of elements in the space of opinions of experts. For if $m_1,\ldots,m_k$ are elements in $\mathcal{M}$ which are to be combined under $\oplus$, we can find respective preimages in $\mathcal{N}$ under the map $V \circ U \circ T$, and then combine those elements using the operation $\otimes$ in the space of opinions of experts $\mathcal{N}$. After all combinations in $\mathcal{N}$ are completed, we project back to $\mathcal{M}$ by $V \circ U \circ T$; the result will be the same as if we had combined the elements in $\mathcal{M}$. The only advantage to this procedure is that combinations in $\mathcal{N}$ are conceptually simpler: there are no awkward normalizations, and we can regard the combination as Bayesian updating on the product space of experts.
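The procedure just described can be exercised end to end. The sketch below combines two weighted sets of probabilistic experts in $\mathcal{N}$ and projects the result by $V \circ U \circ T$, then compares that with Dempster's rule applied to the separately projected mass distributions. It assumes, as in the standard treatment, that V discards the mass on the empty set and rescales the rest by $1 - m(\emptyset)$; T is fused with a set encoding of boolean opinions, and all names are hypothetical.

```python
from fractions import Fraction

def committee_opinion(p1, p2, prior):
    """Eqs. (9)-(10); identically zero if the normalizing sum vanishes."""
    raw = [a * b / k for a, b, k in zip(p1, p2, prior)]
    total = sum(raw)
    return [r / total for r in raw] if total != 0 else [0] * len(raw)

def combine_N(e1, e2, prior):
    """Combination in N over the product space; weights multiply."""
    return [(w1 * w2, committee_opinion(p1, p2, prior))
            for w1, p1 in e1 for w2, p2 in e2]

def T(e):
    """T, with each boolean opinion stored as its set of possible labels."""
    return [(w, frozenset(i for i, p in enumerate(op) if p > 0)) for w, op in e]

def U(e):
    """m(A) = weight of experts naming exactly A, over total weight."""
    total = sum(w for w, _ in e)
    m = {}
    for w, A in e:
        m[A] = m.get(A, 0) + Fraction(w, total)
    return m

def V(m):
    """Assumed form of V: drop m(empty set) and rescale by 1 - m(empty set)."""
    empty = frozenset()
    scale = 1 - m.get(empty, 0)
    return {A: v / scale for A, v in m.items() if A != empty}

def dempster(m1, m2):
    """Dempster's rule: sum m1(B) m2(C) over B & C == A, renormalized by 1 - conflict."""
    raw, conflict = {}, 0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C
            if A:
                raw[A] = raw.get(A, 0) + v1 * v2
            else:
                conflict += v1 * v2
    return {A: v / (1 - conflict) for A, v in raw.items()}

def project(e):
    """The representation V o U o T from N to M."""
    return V(U(T(e)))

prior = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
e1 = [(1, [Fraction(1, 2), Fraction(1, 2), 0]), (2, [0, Fraction(1, 3), Fraction(2, 3)])]
e2 = [(1, [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]), (1, [1, 0, 0])]

print(project(combine_N(e1, e2, prior)) == dempster(project(e1), project(e2)))  # True
```

In the route through $\mathcal{N}$, the conflict that Dempster's rule renormalizes away shows up simply as committees whose opinion is identically zero.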

5.  An  Alternative  Method  for  Combining  Evidence 

With  the  viewpoint  that  the  theory  of  evidence  is  really  simply  statistics  of 
opinions  of  experts,  we  can  make  certain  remarks  on  the  limitations  of  the  theory. 

(1) There is no use of probabilities or degrees of confidence. Although the belief values seem to give weighted results, at the base of the theory experts only say whether a condition is possible or not. In particular, the theory makes no distinction between an expert’s opinion that a label is likely or that it is remotely possible.

(2)  Pairs  of  experts  combine  opinions  in  a  Bayesian  fashion  with  independence 
assumptions  of  the  sources  of  evidence.  In  particular,  dependencies  in  the 
sources  of  information  are  not  taken  into  account. 

(3)  Combinations  take  place  over  the  product  space  of  experts.  It  might  be  more 
reasonable  to  have  a  single  set  of  experts  modifying  their  opinions  as  new 
information  comes  in,  instead  of  forming  the  set  of  all  committees  of  mixed 
pairs. 






Both  the  second  and  third  limitations  come  about  due  to  the  desire  to  have  a 
combination  formula  which  factors  through  to  the  statistics  of  the  experts  and  is 
application-independent.  The  need  for  the  second  limitation,  the  independence 
assumption  on  the  sources  of  evidence,  is  well-known  (see,  e.g.,  [14]).  Without 
incorporating  much  more  complicated  models  of  judgements  under  multiple  sources 
of  knowledge,  we  can  hardly  expect  anything  better. 

The first objection, however, suggests an alternate formulation which makes use of the probabilistic assessments of the experts. Basically, the idea is to keep track of the density distributions of the opinions in probability space. Of course, complete representation of the distribution would amount to recording the full set of opinions for all $\omega$. Instead, it is more reasonable to approximate the distribution by some parameterization, and update the distribution parameters by combination formulas.


We present a formulation based on normal distributions of logarithms of updating coefficients. Other formulations are possible. In marked contrast to the Dempster/Shafer formulation, we assume that all opinions of all experts are nonzero for every label. That is, instead of converting opinions into boolean statements by testing for zero, we assume that all the values are nonzero, and model the distribution of their strengths.


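A simple rewrite of Equation (8) of Section 3.3 yields

$$\mathrm{Prob}(\lambda \mid s_1,s_2) = c(s_1,s_2)\,\mathrm{Prob}(\lambda)\,\frac{\mathrm{Prob}(\lambda \mid s_1)}{\mathrm{Prob}(\lambda)}\,\frac{\mathrm{Prob}(\lambda \mid s_2)}{\mathrm{Prob}(\lambda)}.$$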


This equation depends on an independence assumption, Equation (7). We can iterate this equation to obtain a formula for $\mathrm{Prob}(\lambda \mid s_1,\ldots,s_k)$. In this iteration process, $s_1$ and $s_2$ successively take the place of $s_1 \wedge \cdots \wedge s_i$ and $s_{i+1}$ respectively, as $i$ increases from 1 to $k-1$. Accordingly, we require a sequence of independence assumptions, which will take the form

$$\mathrm{Prob}(s_{i+1} \mid s_1 \wedge \cdots \wedge s_i, \lambda) = c(s_1,\ldots,s_{i+1})\,\mathrm{Prob}(s_{i+1} \mid \lambda)$$

for $i = 1,\ldots,k-1$. Under these assumptions, we obtain

$$\mathrm{Prob}(\lambda \mid s_1,\ldots,s_k) = c(s_1,\ldots,s_k)\,\mathrm{Prob}(\lambda) \prod_{i=1}^{k} \frac{\mathrm{Prob}(\lambda \mid s_i)}{\mathrm{Prob}(\lambda)}.$$


In a manner similar to [29], set

$$L(\lambda \mid s_i) = \log \frac{\mathrm{Prob}(\lambda \mid s_i)}{\mathrm{Prob}(\lambda)}.$$

(Note, incidentally, that these values are not the so-called “log-likelihood ratios”; in particular, the $L(\lambda \mid s_i)$'s can be both positive and negative.) We then obtain

$$\log[\mathrm{Prob}(\lambda \mid s_1,\ldots,s_k)] = c + \log[\mathrm{Prob}(\lambda)] + \sum_{i=1}^{k} L(\lambda \mid s_i),$$

where $c$ is a constant independent of $\lambda$ (but not of $s_1,\ldots,s_k$).

The consequence of this formula is that if the independence assumptions hold, and if $\mathrm{Prob}(\lambda)$ and $L(\lambda \mid s_i)$ are known for all $\lambda$ and $i$, then the approximate values $\mathrm{Prob}(\lambda \mid s_1,\ldots,s_k)$ can be calculated from

$$\mathrm{Prob}(\lambda \mid s_1,\ldots,s_k) = \frac{\mathrm{Prob}(\lambda)\exp\left[\sum_{i=1}^{k} L(\lambda \mid s_i)\right]}{\sum_{\lambda'} \mathrm{Prob}(\lambda')\exp\left[\sum_{i=1}^{k} L(\lambda' \mid s_i)\right]}. \qquad (12)$$
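Equation (12) is a normalized exponential of $\log \mathrm{Prob}(\lambda) + \sum_i L(\lambda \mid s_i)$. A minimal sketch, assuming the priors and the single-source posteriors $\mathrm{Prob}(\lambda \mid s_i)$ are available as lists of positive floats (the numbers below are made up), computes it with the usual max-subtraction for numerical stability and compares it with the product form $\mathrm{Prob}(\lambda)\prod_i \mathrm{Prob}(\lambda \mid s_i)/\mathrm{Prob}(\lambda)$.

```python
import math

def log_coefficients(posterior, prior):
    """L(l | s) = log[Prob(l | s) / Prob(l)] for every label l."""
    return [math.log(q / p) for q, p in zip(posterior, prior)]

def posterior_eq12(prior, coeff_lists):
    """Eq. (12): Prob(l | s_1..s_k) proportional to Prob(l) * exp(sum_i L(l | s_i))."""
    scores = [math.log(p) + sum(L[j] for L in coeff_lists)
              for j, p in enumerate(prior)]
    top = max(scores)                     # subtract the max before exponentiating
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

prior = [0.5, 0.3, 0.2]
posteriors = [[0.6, 0.3, 0.1],    # Prob(l | s1), made-up values
              [0.2, 0.5, 0.3]]    # Prob(l | s2), made-up values
coeffs = [log_coefficients(q, prior) for q in posteriors]
result = posterior_eq12(prior, coeffs)

# Product form: Prob(l) * prod_i Prob(l | s_i) / Prob(l), then normalize.
product = [p * math.prod(q[j] / p for q in posteriors) for j, p in enumerate(prior)]
norm = sum(product)
product = [v / norm for v in product]
print(all(math.isclose(a, b) for a, b in zip(result, product)))   # True
```

Working in the log domain is what makes the later step natural: each new piece of evidence simply adds a vector of coefficients.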

Accordingly, we introduce a space which we term “logarithmic opinions of experts.” For convenience, we will assume that experts have equal weights. An element in this space will consist of a set of experts and a collection of opinions $Y_i = \{y^{(i)}_\omega\}_{\omega \in \mathcal{E}_i}$. Each $y^{(i)}_\omega$ is a map, and the component $y^{(i)}_\omega(\lambda)$ represents expert $\omega$'s estimate of $L(\lambda \mid s_i)$:

$$y^{(i)}_\omega: \Lambda \to \mathbb{R}, \qquad y^{(i)}_\omega(\lambda) \approx L(\lambda \mid s_i).$$

Note that the experts in $\mathcal{E}_i$ all have knowledge of the information $s_i$, and that the estimated logarithmic coefficients $L(\lambda \mid s_i)$ can be positive or negative. In fact, since the experts do not necessarily have precise knowledge of the value of $\mathrm{Prob}(\lambda)$, but instead provide estimates of logs of ratios, the estimates can lie in an unbounded range.

In analogy with our map to a statistical space (Section 3.2), we can define a space which might be termed the “parameterized statistics of logarithmic opinions of experts.” Elements in this space will consist of pairs $(u,C)$, where $u$ is in $\mathbb{R}^n$ and $C$ is a symmetric $n$ by $n$ matrix. We next describe how to project from the space of logarithmic opinions to the space of parameterized statistics.

Let us suppose that for a set of experts $\mathcal{E}$, and for $\Lambda = \{\lambda_1,\ldots,\lambda_n\}$, the $n$-vectors composed of the logarithmic opinions $y_\omega \in \mathbb{R}^n$, $y_\omega = (y_\omega(\lambda_1),\ldots,y_\omega(\lambda_n))$, are approximately (multi-)normally distributed. Thus we model the distribution of the random vector $y = (y(\lambda_1),\ldots,y(\lambda_n))$ by the density function

$$m(y) = \frac{1}{(2\pi)^{n/2}\sqrt{\det C}} \exp\left(-\tfrac{1}{2}(y-u)^T C^{-1} (y-u)\right),$$

where $u \in \mathbb{R}^n$ is the mean of the distribution, and $C$ is the $n$ by $n$ covariance matrix.
That is, in terms of the expectation operator $E\{\,\}$ on random variables over the sample space $\mathcal{E}$,

$$u = (u_1,\ldots,u_n), \qquad u_i = E\{y(\lambda_i)\},$$

and for $C = (c_{ij})$,

$$c_{ij} = E\{(y(\lambda_i) - u_i)(y(\lambda_j) - u_j)\}.$$

These measurements of the statistics of the $y(\lambda)$'s can be made regardless of the true distributions. The accuracy of the model depends on the degree to which the multinormal distribution assumption is valid.
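Computing $(u, C)$ from a finite set of equally weighted experts amounts to a mean and a population covariance, dividing by the number of experts because $E\{\,\}$ is an expectation over $\mathcal{E}$ rather than an unbiased estimate. A dependency-free sketch, using a hypothetical list-of-vectors encoding of the opinions $y_\omega$:

```python
def opinion_statistics(opinions):
    """Mean u and covariance C of equally weighted log-opinion vectors y_w.
    Divides by the number of experts: E{} is the expectation over the expert set."""
    N = len(opinions)
    n = len(opinions[0])
    u = [sum(y[i] for y in opinions) / N for i in range(n)]
    C = [[sum((y[i] - u[i]) * (y[j] - u[j]) for y in opinions) / N
          for j in range(n)]
         for i in range(n)]
    return u, C

u, C = opinion_statistics([[1, 0], [3, 2]])
print(u, C)   # [2.0, 1.0] [[1.0, 1.0], [1.0, 1.0]]
```

With fewer experts than labels, as here, $C$ is singular, so the density above is only defined once enough experts contribute.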

Next we discuss combination formulas in both spaces. Suppose $(\mathcal{E}_i, Y_i)$, $i = 1,2$, are two elements in the space of logarithmic opinions, each describing a sample space of experts together with opinions. Since, according to Equation (12), the logarithmic opinions add, we define the combination of the two elements by $(\mathcal{E}, Y)$, where

$$\mathcal{E} = \mathcal{E}_1 \times \mathcal{E}_2, \qquad Y = \{y_{(\omega_1,\omega_2)}\}_{(\omega_1,\omega_2) \in \mathcal{E}},$$

$$y_{(\omega_1,\omega_2)}(\lambda) = y^{(1)}_{\omega_1}(\lambda) + y^{(2)}_{\omega_2}(\lambda).$$

To consider combinations in the space of statistics, let $m_i(y)$ be the density function over $\mathbb{R}^n$ for the random vector $y^{(i)}$ over the sample space $\mathcal{E}_i$, $i = 1,2$. Assume that each $m_i$ is a multinormal distribution, associated with a mean vector $u^{(i)}$ and a covariance $C^{(i)}$. In order that the projection to the space of statistics be a homomorphism, the definition of combination in the space of statistics should respect the true statistics of the combined opinions. The density function $m(y)$ for the combination $y_{(\omega_1,\omega_2)}$, $(\omega_1,\omega_2) \in \mathcal{E}$, is given by

$$m(y) = \int m_1(y')\,m_2(y - y')\,dy'.$$

Notice that this is the point where we use the fact that the logarithmic opinions add under combination.

Projecting to the space of statistics, we discover the advantage of modeling the distributions by normal functions. Namely, since the convolution of a Gaussian with a Gaussian is once again a Gaussian, we define the combination formula

$$(u^{(1)},C^{(1)}) \oplus (u^{(2)},C^{(2)}) = (u^{(1)} + u^{(2)},\; C^{(1)} + C^{(2)}).$$

That is, since $m_1$ and $m_2$ are multinormal distributions, their convolution is also multinormal with mean and covariance which are the sums of the contributing means and covariances. (This result is easily proven using Fourier transforms.) An extension to the case where $\mathcal{E}_1$ and $\mathcal{E}_2$ have unequal total weights is straightforward.

Having defined combination in the space of statistics, one must show that the transformation from the space of opinions to the space of statistics is a homomorphism, even when the logarithmic opinions are not truly normally distributed. This is easily done: over the product space $\mathcal{E}_1 \times \mathcal{E}_2$, the two summands $y^{(1)}_{\omega_1}$ and $y^{(2)}_{\omega_2}$ are independent random vectors, and the means and covariances of the sum of two independent random vectors are the sums of their means and covariances.
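The homomorphism claim can be checked exactly on a toy example: combine two sets of equally weighted log-opinion vectors over the product space by adding them, compute $(u, C)$ of the result, and compare with the sum of the separately computed statistics. Exact rational arithmetic makes the comparison an equality; the encoding and function names are illustrative assumptions, and, matching the remark above, no normality is assumed anywhere.

```python
from fractions import Fraction

def opinion_statistics(opinions):
    """Mean u and population covariance C over equally weighted experts (exact)."""
    N, n = len(opinions), len(opinions[0])
    u = [sum(y[i] for y in opinions) / Fraction(N) for i in range(n)]
    C = [[sum((y[i] - u[i]) * (y[j] - u[j]) for y in opinions) / Fraction(N)
          for j in range(n)] for i in range(n)]
    return u, C

def combine_log_opinions(Y1, Y2):
    """Product space E1 x E2; each committee's opinion is the componentwise sum."""
    return [[a + b for a, b in zip(y1, y2)] for y1 in Y1 for y2 in Y2]

def combine_statistics(s1, s2):
    """(u1, C1) (+) (u2, C2) = (u1 + u2, C1 + C2)."""
    (u1, C1), (u2, C2) = s1, s2
    u = [a + b for a, b in zip(u1, u2)]
    C = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(C1, C2)]
    return u, C

Y1 = [[1, 0], [3, 2]]
Y2 = [[0, 1], [2, 1], [1, 4]]
lhs = opinion_statistics(combine_log_opinions(Y1, Y2))
rhs = combine_statistics(opinion_statistics(Y1), opinion_statistics(Y2))
print(lhs == rhs)   # True
```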

To interpret a state $(u,C)$ in the space of parameterized statistics, we must remember the origin of the logarithmic-opinion values. Specifically, after $k$ updating iterations combining information $s_1$ through $s_k$, the updated vector $y = (y_1,\ldots,y_n) \in \mathbb{R}^n$ is an estimate of the sum of the logarithmic coefficients,

$$y_j \approx \sum_{i=1}^{k} L(\lambda_j \mid s_i).$$

According to Equation (12), the a posteriori probabilities can then be calculated from this estimate (provided the priors $\mathrm{Prob}(\lambda)$ are known). In particular, the a posteriori probability of a label $\lambda_j$ is high if the corresponding coefficient $y_j + \log[\mathrm{Prob}(\lambda_j)]$ is large in comparison to the other components $y_{j'} + \log[\mathrm{Prob}(\lambda_{j'})]$.

Since the state $(u,C)$ represents a multinormal distribution in the log-updating space, we can transform this distribution to a density function for a posteriori probabilities. Basically, a label $\lambda_j$ will have a high probability if $u_j + \log[\mathrm{Prob}(\lambda_j)]$ is relatively large. However, the components of $u$ represent the center of the distribution (before bias by the priors). The spread of the distribution is given by the covariance matrix, which can be thought of as defining an ellipsoid in $\mathbb{R}^n$ centered at $u$. The exact equation of the ellipsoid can be written implicitly as:

$$(y-u)^T C^{-1} (y-u) = 1.$$

This ellipsoid describes a “one sigma” variation in the distribution, representing a region of uncertainty of the logarithmic opinions; the distribution to two standard deviations lies in a similar but enlarged ellipsoid. The eigenvalues of $C$ give the squared lengths of the semi-axes of the ellipsoid, and accordingly measure degrees of uncertainty: the larger an eigenvalue, the lower the confidence along the corresponding axis. The eigenvectors give the directions in which the eigenvalues measure their uncertainty. Bias by the prior probabilities simply adds a fixed vector, with components $\log[\mathrm{Prob}(\lambda_j)]$, to the ellipsoid, thereby translating the distribution. We seek an axis $j$ such that the components $y_j$ of the vectors $y$ lying in the translated ellipsoid are relatively much larger than other components of vectors in the ellipsoid. In this case, the preponderant evidence is for label $\lambda_j$.
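One way to make this reading of a state $(u, C)$ concrete, offered as an illustration rather than as the procedure of this paper, is Monte Carlo sampling: draw $y$ from the multinormal distribution via a Cholesky factor of $C$, translate by the log priors, and record how often each label has the largest component. The sketch assumes $C$ is symmetric positive definite; the function names, sample count, and seed are arbitrary choices.

```python
import math
import random

def cholesky(C):
    """Lower-triangular L with L L^T = C (C must be symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def dominance_frequencies(u, C, prior, samples=20000, seed=0):
    """Fraction of samples y ~ N(u, C) in which label j maximizes y_j + log Prob(l_j)."""
    rng = random.Random(seed)
    L = cholesky(C)
    n = len(u)
    shift = [math.log(p) for p in prior]        # bias by the priors
    counts = [0] * n
    for _ in range(samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [u[i] + sum(L[i][k] * z[k] for k in range(i + 1)) + shift[i]
             for i in range(n)]
        counts[max(range(n), key=lambda i: y[i])] += 1
    return [c / samples for c in counts]

u = [2.0, 0.0, 0.0]
prior = [1 / 3, 1 / 3, 1 / 3]
tight = [[0.25, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.25]]
loose = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
freq_tight = dominance_frequencies(u, tight, prior)
freq_loose = dominance_frequencies(u, loose, prior)
print(freq_tight)
print(freq_loose)
```

With the tight covariance nearly every sample favors the first label; with the loose one the same center $u$ gives a much less decisive split, which is the sense in which large eigenvalues of $C$ signal low confidence.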

Clearly, the combination formula is extremely simple. Its greatest advantage over the Dempster/Shafer theory of evidence is that only $O(n^2)$ values are required to describe a state, as opposed to the $2^n$ values used for a mass distribution in $\mathcal{M}$. The simplicity and reduction in numbers of parameters have been purchased at the expense of an assumption about the kinds of distributions that can be expected. However, the same assumption allows us to track probabilistic opinions (or actually, their logarithms), instead of converting all opinions into boolean statements about possibilities.

6.  Conclusions 

We have shown how the theory of evidence may be viewed as a representation of a space of opinions of experts, where opinions are combined in a Bayesian fashion over the product space of experts. (Refer to Figure 2.) By “representation”, we mean something very specific, namely, that there is a homomorphism mapping from the space of opinions of experts onto the Dempster/Shafer theory of evidence space. This map fails to be an isomorphism (which would imply equivalence of the spaces) only insofar as it is many-to-one. That is, for each state in the theory of evidence, there is a collection of elements in the space of opinions of experts which all map to the single state. In this way the state in the theory of evidence represents the corresponding collection of elements. In fact, what the elements of this collection have in common is that the statistics of the opinions of the experts defined by each element are similar, in terms of the way statistics are measured by the map U.

Furthermore, combination in the space of opinions of experts, as defined in Section 3, leads to combination in the theory of evidence space. This allows us to implement combination in a somewhat simpler manner, since the formulas for combination without the normalization are simpler than the more standard formulas, and also permits us to view combination in the theory of evidence space as the tracking of statistics of opinions of experts as they combine information in a pairwise Bayesian fashion over the product space of experts. Applying a Bayesian interpretation to the updating of the opinions of experts also makes clear the implicit independence assumptions which must exist in order to combine evidence in the prescribed manner.

From this viewpoint, we can see how the Dempster/Shafer theory of evidence accomplishes its goals. Degrees of support for a proposition, beliefs, and plausibilities are all measured in terms of joint and disjunctive probabilities over a set of experts who are naming possible labels given current information. The problem of ambiguous knowledge versus uncertain knowledge, which is frequently described in terms of “withholding belief,” can be viewed as two different distributions of opinions. In particular, ambiguous knowledge can be seen as observing high densities of opinions on particular disjoint subsets, whereas uncertain knowledge corresponds to unanimity of opinions, where the agreed-upon opinion gives many possibilities. Finally, instead of performing Bayesian updating, a set of values is updated in a Bayesian fashion over the product space, which results in non-Bayesian formulas over the space of labels.

In meeting each of these goals, the theory of evidence invokes compromises that we might wish to change. For example, in order to track statistics, it is necessary to model the distribution of opinions. If these opinions are probabilistic assignments over the set of labels, then the distribution function will be too complicated to retain precisely. The Dempster/Shafer theory of evidence solves this problem by simplifying the opinions to boolean decisions, so that each expert’s opinion lies in a space having $2^n$ elements. In this way, the full set of statistics can be specified using $2^n$ values. We have suggested an alternate method, which retains the probability values in the opinions without converting them into boolean decisions, and requires only $O(n^2)$ values to model the distribution, but fails to retain full information about the distribution. Instead, our method attempts to approximate the distribution of opinions with a Gaussian function.